Unstructured intermediate states in single protein force experiments 
Recent single-molecule force measurements on single-domain proteins have highlighted a three-state folding mechanism where a stabilized intermediate state (I) is observed on the folding trajectory between the stretched state and the native state. Here we investigate on-lattice protein-like heteropolymer models that lead to a three-state mechanism and show that force experiments can be useful to determine the structure of I. We have mostly found that I is composed of a core stabilized by a high number of native contacts, plus an unstructured extended chain. The lifetime of I is shown to be sensitive to modifications of the protein that spoil the core. We then propose three types of modifications (point mutations, cuts and circular permutations) aiming at: 1) confirming the presence of the core and 2) determining its location, within one amino acid accuracy, along the polypeptide chain. We also propose force jump protocols aiming to probe the on/off-pathway nature of I.

Keywords: single molecule experiments — protein folding — kinetic intermediates — unstructured proteins 
— on-lattice models 



The recent development of single-molecule experimental tools has made it possible to investigate the fundamental biochemical and biophysical processes occurring at the molecular level inside the cell. For instance, the folding of proteins can nowadays be studied by manipulating one protein at a time. Examples are the titin molecule pulled by AFM or the E. coli 155-residue RNase H protein pulled by optical tweezers. At low denaturant concentration, FRET measurements have shown the presence of highly compact denatured states whose existence was expected from previous bulk experiments. The latter suggest a hierarchical folding mechanism where the folding of the protein to the native state is preceded by a fast collapse of the most stable region of the native structure. The formation of a structure that has a short lifetime and many native contacts has been observed during the folding of many single-domain proteins. On the other hand, recent experiments using optical tweezers have investigated the unfolding/folding transition of the RNase H protein under the action of a mechanical force applied at the two ends of the molecule [11]. These experiments show the stabilization of an intermediate state at forces around 5 pN [11]. The protein is observed to exist in three different states: the stretched (S), the intermediate (I) and the native (N) states. Using thermodynamic considerations it has been argued that I is identical to the early state (E) that forms at zero force and room temperature [11]. The experimental results also suggest that I is an obligatory step in the folding pathway from S to N, hereafter referred to as an on-pathway intermediate.

The determination of the structure of generic unstructured states, i.e. states that lack a well-structured three-dimensional fold, is a major experimental challenge in modern biophysics. A well-known example is the molten-globule state sometimes observed in the thermal denaturation of proteins. The identification of unstructured states is limited by their large structural fluctuations, which make the usual techniques (X-ray or NMR) poorly predictive. On the other hand, growing evidence shows that a large number of proteins are intrinsically unstructured and contain a fair amount of disordered regions [16]. The use of new experimental techniques aiming to probe unstructured states is therefore a question of great interest.

Is there any connection between the intermediate states that have been detected in AFM and optical tweezers experiments and the intrinsically disordered states observed in many proteins? Is it possible to extract useful information about the structure of the intermediate state observed in single molecule pulling experiments by designing specific experimental protocols? To address such questions we use a phenomenological approach based on the numerical investigation of on-lattice heteropolymers in the presence of mechanical force [17]. This class of models contains the minimal number of ingredients necessary to capture the basic phenomenology (thermodynamics and kinetics) of the folding transition problem. In addition, they are simple enough to allow exhaustive statistical studies that are difficult to carry out with other more accurate and realistic descriptions of proteins. In contrast to simple two-state models, on-lattice heteropolymers are phenomenological models where the molecular extension that reflects the internal configuration of the protein is the natural reaction coordinate.

By introducing mechanical force in the analysis, we show how it is possible to reproduce and interpret the three-state behavior observed in the experiments. We numerically investigate several topologies of the native structure and find that they generally lead to a three-state scenario in the presence of mechanical force. The new intermediate state (I) is typically composed of a
FIG. 1: Kinetic scheme of a folding reaction with different types of intermediate states. I2 is off-pathway (misfolded) since it is not directly connected to N, whereas I1 is on-pathway.




FIG. 2: Two archetypal native topologies of designed heteropolymers on a cubic lattice. Left: Structure S1, N = 27. Right: Structure S2, N = 36. The numbers indicate the positions n of the monomers along the chain. The crosses indicate the two ends of the chain.



compact core with a high number of native contacts, plus an unstructured extended chain. Moreover, I is not necessarily identical to the early state (E) that forms when folding at zero force.

We then show how the structure of the intermediate state I that has been observed in single molecule pulling experiments can be determined by means of specific experimental protocols that have been used in protein biophysics in different contexts. We propose experimental single protein force protocols that introduce modifications in the amino acid sequence of the protein to infer information about the structure of I. We propose three techniques based on i) single amino acid mutations, ii) cutting off the polypeptide chain at various lengths and iii) circular permutations of the protein. These techniques locate the core because the S ↔ I system undergoes a transition when the modifications involve amino acids of the core. These protocols could also be used in the future to unveil the local structure of globally unstructured proteins that contain a mixture of disordered and ordered regions.

Finally, by investigating the folding kinetics under different solvent conditions, we have also found other intermediate states that, as we show, are misfolded states (Fig. 1). In contrast with on-pathway states, misfolded states are off-pathway: starting from such a state, the folding pathway to N must pass through S. Although off-pathway and on-pathway states may be hard to distinguish (e.g. when they have the same molecular extension), we show that a force jump protocol is useful to quantify the fractions of folding trajectories that lead to on-pathway and off-pathway states, respectively.

I. THREE-STATE PROTEINS 

Following the sequence optimization procedure of Shakhnovich and Gutin, we design heteropolymers on a cubic lattice that fold into a unique compact structure (Fig. 2). The heteropolymer consists of a chain of monomers indexed by i (1 ≤ i ≤ N) with nearest-neighbor pair interactions E_ij between monomers i, j that are not contiguous along the chain. The values of E_ij, which determine the native configuration, are obtained by following an optimization algorithm starting from an initial set of interactions E^0_ij and a given topology of the native structure, i.e. a given chain configuration in the native state N. We note that, by definition of the model, several sets of interactions E_ij can be associated with identical topologies of the native state. The values of E^0_ij, and hence of E_ij, are drawn from a Gaussian distribution of zero mean and variance Δ². Δ is measured in units of k_B T*, where k_B is the Boltzmann constant and T* is a reference temperature that we fix to 300 K. The dynamics of the heteropolymer consists of the standard "corner and crankshaft" Monte-Carlo dynamics with Metropolis rates [20] (elementary moves are shown in the illustration). Note that these types of moves might not be optimally suited for pulling experiments since they transmit stress very slowly over long straight chains. However, we still expect that the generality of our results goes beyond the details of the local dynamics we use for the on-lattice heteropolymers.
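The ingredients of the model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the simulations: the function names are hypothetical, the matrix returned by `gaussian_interactions` plays the role of the initial set E^0_ij (the sequence optimization step is not reproduced), and the move set itself is omitted.

```python
import math
import random

def gaussian_interactions(n, delta, rng):
    """Symmetric matrix of pair energies drawn from a zero-mean Gaussian
    of variance delta**2 (the un-optimized starting set, in units of k_B T*)."""
    E = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 2, n):  # monomers contiguous along the chain do not interact
            E[i][j] = E[j][i] = rng.gauss(0.0, delta)
    return E

def contact_energy(coords, E):
    """Sum E[i][j] over non-contiguous monomer pairs that sit on
    nearest-neighbor sites of the cubic lattice."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 2, n):
            lattice_distance = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
            if lattice_distance == 1:
                total += E[i][j]
    return total

def metropolis_accept(dE, kT, rng):
    """Metropolis rule: always accept downhill moves, accept uphill
    moves with probability exp(-dE / kT)."""
    return dE <= 0.0 or rng.random() < math.exp(-dE / kT)
```

For a four-monomer chain folded into a square, the only non-contiguous nearest-neighbor pair is (0, 3), so `contact_energy` returns `E[0][3]`.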
The timescale is fixed by the unit of Monte-Carlo steps that we set to 100 ns, a value that leads to results in quantitative agreement with experimental results (e.g. [11]). In this type of model, the values of E_ij correspond to specific short-range tertiary contacts along the protein chain. Although long-range interactions, side-chain interactions and other short-scale details of proteins (such as the secondary structural motifs) are not included in the model, such designed heteropolymers have been shown to display folding properties that are similar to those of single-domain proteins. The results we show here are quite general and have been reproduced with different native structures. However, for the sake of clarity, throughout this paper we present results for two archetypal compact structures S1, S2 (Fig. 2) whose sizes are respectively N = 27, 36. These correspond to small globular proteins with a number of residues in the range of 50-100 [21].

FIG. 3: Free energy profiles projected along r_end and Q for the structure S2. Δ = 1.2 k_B T* and f = 9 pN. Three main states can be defined: the native state (N), an intermediate state (I) and the stretched state (S). They correspond to the deeper local minima along the free energy profiles. The values for r_end and Q have been averaged over 10 μs.

To characterize the state of the heteropolymer, we monitor the temporal evolution of the end-to-end distance r_end of the molecule and the fraction of native contacts Q (0 ≤ Q ≤ 1). The lattice spacing is set equal to 1 nm for the heteropolymer to have contour lengths that are similar to those of proteins studied in experiments (e.g. [11]). A state is defined as the location of a minimum in the free energy projected along Q or r_end (see Fig. 3). Due to the discrete nature of the on-lattice heteropolymer, the free energy landscape along r_end is a rugged surface. Intermediate states then appear as highly rugged basins that can be better identified by ensemble or time averaging of the values of r_end and Q over a finite bandwidth. In this way, we obtain smooth free energy landscapes along r_end and Q with well defined minima (see Fig. 3).

Small single-domain proteins are commonly described as two-state systems having two possible conformations: native and denatured [5]. In experiments, by varying the concentration of denaturant one finds a first-order like transition where both states coexist, although exceptions to this general result have been reported. In the presence of applied mechanical force, cooperative transitions take place between the native state N and a stretched state S, as observed in single molecule AFM measurements on engineered polyproteins and in RNA pulling experiments using optical tweezers [25].

In order to introduce mechanical force in the lattice we must avoid the lattice anisotropy effects that act as kinetic traps for the rotational degrees of freedom. To this end, we add to the energy a term of the type −f · r_end, where f is a force of constant modulus (measured in units of Δ/a) that is always aligned with the end-to-end vector r_end of the heteropolymer (see supplementary material). We have verified that a two-state system under the action of a (moderate) mechanical force leads to an exponential distribution of folding/unfolding times (Figs. S1 and S2 in the Supp. Mat.).

FIG. 4: Upper left: Three-state behavior in S2. Δ = 1.55 k_B T* and f = 10.1 pN. Upper right: Typical structure, composed of a core plus an extended chain, of a configuration in I. Lower panel: Experimental trace of the RNase H protein at constant force (f ≈ 6 pN) using optical tweezers (taken from [11]).
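Because the force is always aligned with the end-to-end vector, the added term reduces to minus the force modulus times the end-to-end distance. The following minimal sketch shows how such a term could enter the energy used in the Metropolis rule. The function names and the conversion k_B T* ≈ 4.14 pN nm at 300 K are assumptions made for this illustration, with the force expressed in pN and lengths in nm.

```python
import math

KT_PN_NM = 4.14  # assumed value of k_B T* at 300 K, in pN nm

def end_to_end(coords, spacing_nm=1.0):
    """End-to-end distance of a lattice conformation, in nm."""
    first, last = coords[0], coords[-1]
    return spacing_nm * math.sqrt(sum((a - b) ** 2 for a, b in zip(first, last)))

def total_energy(contact_energy_kT, coords, force_pN):
    """Contact energy minus the mechanical work f * |r_end|, in units of k_B T*.
    The force is taken parallel to r_end, so only its modulus matters."""
    return contact_energy_kT - force_pN * end_to_end(coords) / KT_PN_NM
```

With this form, two conformations with the same contacts differ in energy only through their extension, and the more extended one is favored at any positive force.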



Intermediate and misfolded states 

Starting from S and by further decreasing the force down to zero, a single-domain protein shows a cooperative transition to N at a given value of the force. Our simulations show that several structures that exhibit a two-state behavior in a given range of temperatures also show a three-state behavior under the action of mechanical force at lower temperatures, or equivalently at larger values of Δ at the fixed temperature T*. In Fig. 4 we show the three-state behavior by plotting the temporal evolution of both r_end and Q, starting from a random initial configuration. The three-state mechanism has been observed for different matrices E_ij, i.e. for different energy values and different topologies of the native structure.

FIG. 5: Force-extension curve for S2 with Δ = 1.2 k_B T* in a pulling protocol with different loading rates r. Left, r = 4 pN/s. Right, r = 0.4 pN/s. We have included the contribution of the handles (modeled as a freely jointed chain). The total extension (protein plus handles) is equal to x = r_end + x_FJC, where x_FJC = 100 [coth(fa/k_B T) − k_B T/(fa)] corresponds to the extension of the freely jointed chain at force f. At large loading rate r, the unfolding transition N → S is of the all-or-none type [11] whereas at lower loading rates (right panel), the intermediate state I along the transition (blue circle) can be resolved. At low rates, we also observe multiple transitions between S and I during the refolding (black dashed circle).

FIG. 6: Example of a structure (N = 36) for which we have not observed any three-state mechanism under any conditions.

Fig. 4 shows that the final folding stage takes place from I, suggesting that I is on-pathway. Sometimes, however, the transition from S to N does not go through I (see Fig. S3). Although this transition is rare, it clearly shows that the folding pathway is not unique. By repeatedly pulling and relaxing the protein at loading rates equivalent to those used in the experiments [11], we observe an all-or-none unfolding transition of N (Fig. 5). At much lower loading rates, we observe large fluctuations in the molecular extension due to the presence of I (Fig. 5), a result that is consistent with previously reported simulations. These features of the force-extension curves could be checked in future single molecule experiments.
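The handle contribution entering the force-extension curves (caption of Fig. 5) is the standard freely jointed chain expression. A minimal sketch is given below; the prefactor 100 is assumed here to be a contour length in nm, and the Kuhn length `kuhn_nm` and thermal energy `kT_pN_nm` are illustrative values that the text does not specify.

```python
import math

def fjc_extension(force_pN, contour_nm=100.0, kuhn_nm=1.0, kT_pN_nm=4.14):
    """Freely jointed chain extension L * (coth(u) - 1/u), with u = f a / k_B T."""
    u = force_pN * kuhn_nm / kT_pN_nm
    if abs(u) < 1e-6:
        # Small-force limit of the Langevin function, avoids dividing by zero.
        return contour_nm * u / 3.0
    return contour_nm * (1.0 / math.tanh(u) - 1.0 / u)

def total_extension(r_end_nm, force_pN):
    """Protein end-to-end distance plus the handle extension at the same force."""
    return r_end_nm + fjc_extension(force_pN)
```

The handle extension vanishes at zero force and grows monotonically towards the contour length, so it shifts the measured extension without changing the size of the jumps between states at a given force.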

The native topology. For each heteropolymer we have determined the topology of I, i.e. the configuration of the chain in the state I. Remarkably, we have always obtained a state composed of a compact core whose contacts are mainly native plus a chain that is extended and hence has few native contacts (see Figs. 4, 7 and 9).

Next, we have investigated the folding mechanism for different matrices of energies E_ij that keep the same native topology, i.e. the same chain configuration in N. In most of the cases, we find a three-state behavior where I



FIG. 7: Example of a structure for which the kinetic barrier between I and N is smaller than that between I and S. In this case, during the unfolding transition (N → S), we can observe a transient regime preceding the transition where the molecule switches between N and I. Δ = 1.09 k_B T* and f = 9.3 pN. The leftmost lower figure shows a typical configuration of the heteropolymer chain in I.

shows a structure formed by the same compact core plus an extended random coil (see also Fig. S4 in Supp. Mat.). Because the core, and hence I, is identical in all cases, this suggests a strong correlation between the three-state behavior and the topology of the native structure, independently of the precise values of the energies E_ij. In addition, we have checked that some topologies never lead to the formation of an intermediate state. As an example, the structure shown in Fig. 6 does not lead to a three-state mechanism for any combination of temperature and force values. However, we are not able to list the features that a native structure must have in order to show a force-induced three-state behavior in a given range of temperatures. In contrast, as we shall see below, we have found several different structures that show three-state behavior at sufficiently low temperatures.

A versatile intermediate state. The experiments on RNase H and our simulations using S2 suggest that the free energy barrier separating I and N is higher than the free energy barrier separating I and S (Fig. 3). This explains why, in some range of force and temperature, the folding transition to N starting from S is preceded by a transient regime where the molecule switches between S and I (see Fig. 4). We have found other scenarios where the free energy barrier between N and I is smaller than that between S and I. In this case we observe, at some force and temperature values, a behavior symmetric to the previous one, i.e. a switching behavior between N and I that precedes the unfolding transition from N to S (see Fig. 7).

The intermediate state with and without force. We have investigated whether I corresponds to the early compact structure E that forms, starting from a random initial configuration, during the folding at zero force and at the same temperature. We find that sometimes both states are correlated, whereas in other cases they are not.

At zero force, I is not well-defined since it is not a local minimum along Q or r_end. As a consequence, we have used a heuristic method to determine the state E that has to be compared with I. The procedure is based on the fact that, on average, Q monotonically tends to 1 during the folding transition. Therefore, for a given random initial condition (Q ≈ 0), during one folding trajectory at zero force, we record the first configuration that has a value of Q identical to the value of Q in I. We then define the state E as the ensemble of these first configurations that are obtained by sampling different random initial conditions and different noise histories. In this ensemble of configurations, we compute the average number m of native contacts for a given monomer as a function of its position n along the chain. The distribution m(n) is then compared to that obtained for I. The results obtained for the structures S1 and S2 (Fig. 8) suggest two types of distributions. In Fig. 8a (structure S1) the states E and I are highly correlated whereas in Fig. 8b (structure S2) E and I seem to be uncorrelated with each other.

FIG. 8: Average number m of native contacts as a function of the position n of the monomer along the chain. The histograms (in red) correspond to the number of native contacts in the early state (E) whereas the blue dashed lines correspond to the number of native contacts in I at (a) Δ = 1.43 k_B T*, f = 10.6 pN for S1 and (b) Δ = 1.2 k_B T*, f = 9 pN for S2.

Quantitatively, we measure the correlation between the two structures by computing i) χ_m, the correlation coefficient (also called Pearson's correlation coefficient) for the average number of native contacts m(n) and ii) χ_δ, the correlation coefficient for the variation of m(n) along the chain, δ(n) = m(n + 1) − m(n). These are defined by:

\chi_m = \frac{\overline{m_i(n)\,m_e(n)} - \overline{m_i(n)}\;\overline{m_e(n)}}{\prod_{k=i,e} \sqrt{\overline{m_k(n)^2} - \overline{m_k(n)}^{\,2}}}, \qquad
\chi_\delta = \frac{\overline{\delta_i(n)\,\delta_e(n)} - \overline{\delta_i(n)}\;\overline{\delta_e(n)}}{\prod_{k=i,e} \sqrt{\overline{\delta_k(n)^2} - \overline{\delta_k(n)}^{\,2}}}

where the overbar denotes the average over the monomer positions n. The sub-indexes i and e refer to the states that we are comparing, i.e. I and E. The sums in χ_m run over all the monomers n = 1, ..., N whereas the sums in χ_δ run over the first N − 1 monomers. Values of the correlation coefficients close to 0 reflect a low correlation between the structures whereas values close to 1 reveal large correlations. Negative values indicate anticorrelation.
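As an illustration of how these two coefficients can be evaluated from contact profiles, the sketch below computes Pearson's correlation coefficient for m(n) and for its variation δ(n) = m(n + 1) − m(n). The function names are hypothetical and the profiles in the usage lines are made-up numbers, not simulation data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in y) / n - mean_y ** 2
    if var_x <= 0.0 or var_y <= 0.0:
        raise ValueError("correlation is undefined for a flat profile")
    return cov / math.sqrt(var_x * var_y)

def variation(m):
    """delta(n) = m(n + 1) - m(n), defined on the first N - 1 monomers."""
    return [m[n + 1] - m[n] for n in range(len(m) - 1)]

def chi_m(m_i, m_e):
    return pearson(m_i, m_e)

def chi_delta(m_i, m_e):
    return pearson(variation(m_i), variation(m_e))

# Illustrative profiles only: the second is proportional to the first.
m_intermediate = [0.0, 1.0, 3.0, 4.0]
m_early = [0.0, 2.0, 6.0, 8.0]
print(chi_m(m_intermediate, m_early), chi_delta(m_intermediate, m_early))
```

Comparing δ(n) rather than m(n) removes any uniform offset between the two profiles, which is one reason the two coefficients can differ for the same pair of states.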

We have then investigated, as exhaustively as possible, the native state properties that lead to a similarity between I and E. In Fig. 9 we report four examples of such structures that show a three-state behavior and that have different degrees of similarity between I and E.

Why do we expect the states I and E to differ? First of all, during the folding at zero force the monomers tend to form native contacts independently of their position along the polymer chain. In contrast, the intermediate state with force has the core/extended-chain structure described above, which is energetically favored due to the stretching effect of the force. Despite this difference, most of the structures we investigated show a strong correlation (χ_δ biased towards 1) for the variation δ(n) of the average number of native contacts (see Figs. 8 and 9).

Next, we have observed that monomers in I tend to locally form crankshafts (see Figs. 4 and 9; see also the illustration shown at the beginning of Section I for the shape of a crankshaft). Moreover, structures with a high similarity between I and E also show a fairly high content of monomers that form crankshafts in N (see for instance the structure S2 in Fig. 2 and some of the structures in Fig. 9). A crankshaft arrangement of monomers reflects the formation of non-covalent bonds between monomers that are close to each other along the polymer chain. From the point of view of real proteins, this would suggest that the interaction of sub-units that are close to each other along the amino acid chain is necessary for I and E to be similar.

On and off-pathway states. The extension trace of Fig. 4 (see also Figs. S5 and S6 in Supp. Mat.) shows that the last folding step starts from I. A similar observation has led Cecconi et al. to argue that I is on-pathway. However, we cannot discard the possible presence of additional off-pathway intermediate states having the same molecular extension, i.e. misfolded states (Fig. 1). We then propose the following
FIG. 9: Four examples of structures showing a three-state behavior and having different degrees of correlation between I and E. For each figure, the upper panel on the left shows the native structure whereas the lower panel on the left shows a typical configuration of the heteropolymer chain in I. Fig. 9a: Δ = 1.09 k_B T* and f = 7.6 pN. Fig. 9b: Δ = 1.09 k_B T* and f = 6.7 pN. Fig. 9c: Δ = 1.11 k_B T* and f = 7.5 pN. Fig. 9d: Δ = 1.09 k_B T* and f = 7.2 pN.



experimental force jump protocol to detect and quantitatively measure the fraction of misfolded states. Each time the system folds into I, we relax the force to zero and compute the distribution of folding times. In the presence of misfolded states, one should get a bimodal distribution composed of a short-time contribution corresponding to on-pathway states and a long-time tail corresponding to off-pathway states. Indeed, misfolded states are expected to be separated from N by high energy barriers that slow down the folding dynamics, leading to large folding times.
We have carried out numerical simulations of this force jump protocol (i.e. we relax the force to zero once the system has a number of contacts corresponding to I) in two cases: when only on-pathway intermediate states are present and when a mix of on-pathway and off-pathway states is present. In most cases we studied we found that I was on-pathway. A convenient way to generate misfolded states is to consider a structure showing only on-pathway states and then add solvent conditions that favor the formation of misfolded states. We include the effect of hydrophobic interactions between the amino acid side chains of a protein and the water molecules in solution by introducing an additional energy term ε_h for each interaction between a molecule of the solvent (corresponding to a free node on the lattice) and a monomer. The overall energy contribution for a monomer is p ε_h, where p is the number of nearest-neighbor free nodes of that monomer. ε_h > 0 favors hydrophobicity by increasing the interactions between the monomers. For the sake of simplicity, we have taken a single value of ε_h for all monomers. However, one could be more general and introduce a value of ε_h for each individual monomer by adding specific (positive or negative) contributions to control the degree of hydrophobicity of each monomer. The latter procedure has been used to model the effect of a denaturant on the folding transition [27].
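The solvent term can be made concrete with a short sketch that counts, for each monomer, the number p of free nearest-neighbor lattice nodes and sums p ε_h over the chain. The function names are hypothetical; `eps_h` may be a single number (the case used in the text) or a per-monomer list (the more general case mentioned above).

```python
NEIGHBOR_STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def free_neighbors(coords, index):
    """Number p of nearest-neighbor lattice nodes of a monomer not occupied by the chain."""
    occupied = set(coords)
    x, y, z = coords[index]
    return sum((x + dx, y + dy, z + dz) not in occupied for dx, dy, dz in NEIGHBOR_STEPS)

def solvent_energy(coords, eps_h):
    """Sum of p * eps_h over the monomers; eps_h is a number or a per-monomer list."""
    if isinstance(eps_h, (int, float)):
        eps_h = [eps_h] * len(coords)
    return sum(e * free_neighbors(coords, i) for i, e in enumerate(eps_h))

extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
compact = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(solvent_energy(extended, 0.5), solvent_energy(compact, 0.5))
```

For a positive ε_h the compact four-monomer square exposes fewer free nodes than the straight chain and therefore has the lower solvent energy, which is the sense in which ε_h > 0 promotes collapse.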

Fig. 10 reports the distribution of folding times for S1, which shows the presence of misfolded states. Without hydrophobicity (ε_h = 0), the temperature and force values used are such that there are only on-pathway states, thus leading to a smooth monotonic distribution of folding times. By adding hydrophobicity, i.e. ε_h > 0, monomers tend to interact more with each other. Although the temporal evolution of r_end is similar to that observed in the ε_h = 0 case, we actually obtain a mixture of on-pathway and off-pathway states. The former contribute to the short-time part of the distribution in Fig. 10 whereas the latter correspond to the contribution at very large times. By separately integrating each part of the distribution, we are able to measure the fractions of folding trajectories that lead to on-pathway and off-pathway states, respectively. For the example shown in Fig. 10 we get 42% ± 5% on-pathway and 58% ± 5% off-pathway folding trajectories. Force jump protocols could be implemented in optical tweezers and AFM experiments to measure the fraction of on/off-pathway folding trajectories.
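The integration of the two parts of a bimodal folding-time distribution can be sketched as a simple split at a cutoff time. The cutoff `threshold_s` and the binomial error estimate are assumptions made for this illustration; the text does not specify how the two parts were separated or how the quoted uncertainties were obtained.

```python
import math

def pathway_fractions(folding_times_s, threshold_s):
    """Fractions of trajectories folding before (on-pathway) and after
    (off-pathway) a cutoff time, with a binomial standard error."""
    n = len(folding_times_s)
    n_on = sum(t <= threshold_s for t in folding_times_s)
    p_on = n_on / n
    std_err = math.sqrt(p_on * (1.0 - p_on) / n)
    return p_on, 1.0 - p_on, std_err

# Made-up folding times in seconds: three fast events and two slow ones.
print(pathway_fractions([0.1, 0.2, 0.3, 25.0, 30.0], threshold_s=20.0))
```

A cutoff placed in the gap between the short-time peak and the long-time tail is the natural choice when the two contributions are well separated, as in the right panel of Fig. 10.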
FIG. 10: Left panel: Distribution of the folding time τ_f after setting the initial force to zero once I is reached, in the absence of hydrophobic effects, for the structure S1 with Δ = 1.7 k_B T* and f = 13.2 pN. In this case, we observe only on-pathway states. Right panel: Same distribution of the folding time τ_f but in the presence of hydrophobic effects that lead to off-pathway intermediate states. The values are Δ = 1.67 k_B T*, f = 13.2 pN and ε_h = 0.5 k_B T. The rightmost vertical bar counts the trajectories that have τ_f > 20 s. Because the systems we simulate are small, when I is reached we constrain the system to keep a number of native contacts larger than that in I. In this way we reduce finite-size effects and obtain a clear separation of timescales between on-pathway and off-pathway states.








FIG. 11: Point mutation protocol. Left: Unstructured intermediate configuration. The star indicates a mutated amino acid. Right: k_u as a function of the location n of the mutation along the chain for different values of the hydrophobicity of the mutated monomer, ε*. The structure is S2, Δ = 1.2 k_B T* and f = 9 pN. The dashed line indicates the position of the first monomer (n_f = 19) inside the core. The core is composed of all the monomers from that monomer to the end of the chain.



II. EXPERIMENTAL PROTOCOLS AND THE 
INTERMEDIATE STATE 

Determining the structure of non-native states of nucleic acids and proteins remains a major experimental challenge in modern biophysics. For instance, even the structure of the denatured state of the lysozyme protein that has been studied over half a decade is still unresolved. In this regard, the use of single-molecule techniques appears as a promising tool to identify kinetic pathways and intermediate states. In this section, we propose specific protocols in single molecule pulling experiments aiming to determine, within one amino acid accuracy, the location of the core in proteins with an intermediate state. Implementation of these protocols requires well-known methodologies in protein biophysics.

A useful method to determine the structure of I consists in measuring the unfolding/folding kinetic rates, k_u and k_f, associated with the transition I ↔ S after modifying the protein in various ways. These rates are obtained by recording, at a given force, the molecular extension of the protein and measuring the inverse of the average residence time of the protein in each state I and S (see Fig. 11).
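The rate estimate described above, the inverse of the mean residence time in each state, can be sketched as follows for a trace that has already been assigned to discrete states. The state labels, the sampling interval `dt` and the choice of discarding the first and last (truncated) residence periods are assumptions of this illustration.

```python
def residence_times(trace, dt):
    """List of (state, duration) for consecutive runs in a state-labelled trace.
    The first and last runs are dropped because the observation window cuts them."""
    runs = []
    start = 0
    for k in range(1, len(trace) + 1):
        if k == len(trace) or trace[k] != trace[start]:
            runs.append((trace[start], (k - start) * dt))
            start = k
    return runs[1:-1]

def rates(trace, dt):
    """Exit rate from each state: inverse of its mean residence time."""
    durations = {}
    for state, duration in residence_times(trace, dt):
        durations.setdefault(state, []).append(duration)
    return {state: len(d) / sum(d) for state, d in durations.items()}

# Toy trace sampled every 1 time unit, hopping between S and I.
k = rates(list("SSIIISSIIIIS"), dt=1.0)
k_u, k_f = k["I"], k["S"]  # leaving I towards S, and leaving S towards I
```

In this two-state reading of the trace, the exit rate from I plays the role of k_u and the exit rate from S the role of k_f.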

A "φ-value" force protocol. A possible modification of the protein consists in selectively mutating an individual amino acid. The idea of this method is reminiscent of the φ-value technique used in bulk measurements. In our case, we consider a heteropolymer where initially ε_h^i = 0 for all i. We then select one monomer i and assign new values for the interaction energies E_ij between that monomer and the other monomers j of the chain. We also increase the degree of hydrophobicity of that monomer i by setting ε_h^i = ε* > 0 while keeping the rest of the ε_h^j's equal to zero. In Fig. 11 we report for S2 the values of k_u as a function of the location n of the mutation along the chain and for different values of ε*. One can clearly see a transition separating low and high rates that is distinctly located at the edge of the core. Low rates correspond to mutations on the free chain whereas high rates correspond to mutations inside the core. The larger ε*, the sharper the transition, which suggests the use of very hydrophilic amino acids, such as serine or threonine, as point mutations.

From an experimental point of view, a single mutation may not be sufficient to distinguish a transition because the differences in the rates are too small. We then suggest a multiple-point mutation analysis: instead of mutating a single amino acid, two or more successive amino acids can be mutated. This helps to identify the transition more clearly but also leads to a less precise location of the edge of the core (Fig. S7 in Supp. Mat.).
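One simple way to read off the edge of the core from a set of measured rates is to look for the largest jump in k_u between neighboring mutation positions. This criterion is an assumption introduced here for illustration and is not taken from the analysis above; the function name and the numbers in the usage lines are hypothetical.

```python
def locate_core_edge(k_u_by_position):
    """Return the mutation position n at which k_u changes most with
    respect to the previous measured position."""
    positions = sorted(k_u_by_position)
    jumps = [
        (abs(k_u_by_position[b] - k_u_by_position[a]), b)
        for a, b in zip(positions, positions[1:])
    ]
    return max(jumps)[1]

# Made-up rates (arbitrary units): low on the free chain, high inside the core.
measured = {16: 1.0, 17: 1.1, 18: 0.9, 19: 5.0, 20: 5.2}
print(locate_core_edge(measured))
```

With multiple-point mutations the same procedure applies to blocks of positions, at the price of a coarser estimate of the edge.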

Cutting the proteins. According to Kramers-Bell theory, the transition rates between I and S depend exponentially on the applied mechanical force, showing a chevron-like shape (see Fig. 12). In this kind of plot we represent the unfolding (k_u) and folding (k_f) rates as a function of force. Therefore, the increasing (respectively decreasing) curves correspond to the dependence of the rates k_u (respectively k_f) of the transition I → S (respectively S → I). The crossing point of a chevron
FIG. 12: Cutting protocol. Left: Unstructured intermediate configuration. The black dashed circle indicates the cut in the chain. Cutting at the monomer n along the heteropolymer means "removing the monomers 1, 2, ..., n−1". Right: We report the values of the "unfolding" and "folding" rates for the reaction I ↔ S as a function of the applied force f, which leads to the so-called chevron plots. The different chevron plots correspond to different locations of the cut in the heteropolymer. The structure is S2 with Δ = 1.2 k_B T*. Insets: Critical forces as a function of the cutting position n. The vertical dashed line marks the position of the first monomer (n_f = 19) that separates the core from the extended chain. We find that, at n_f, the value of the critical force (where unfolding and folding rates are equal) suddenly drops (compare the chevron plots for n = 19 (black) and n = 20 (red)).



plot (k_u = k_f) is located at the value of the force where
both rates (X → S and S → X) are identical. This is
the critical force, at which the two species (X and S) are
equally probable. We have measured these rates in S1
and S2 after cutting off the extremities of the chains at
certain locations, i.e. leading to a shorter polypeptide
chain. Fig. 12 shows the chevron plots for S2 as we keep
the extremity fixed at one end of the core and progressively
reduce the length of the chain. We see a sharp
transition, characterized by a drop of the critical force
(insets of Fig. 12), when the cut is made inside the core.
In this case, X loses its stability because the core is
spoiled, a rather intuitive result. This again allows the
core to be located with one-monomer accuracy. Other similar
modifications where protein interactions are changed
are shown in Fig. S8 in the Supp. Mat.
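
The chevron construction can be made concrete with a minimal Bell-type sketch. The parametrization below (zero-force rates `k_u0` and `k_f0`, distances to the transition state `x_u` and `x_f`, and thermal energy `kBT` in reduced units) is an assumption chosen for illustration; the numerical values are arbitrary and are not taken from the simulations.

```python
import math


def bell_rates(f, k_u0, k_f0, x_u, x_f, kBT=1.0):
    """Bell-type rates at force f: k_u grows and k_f decays exponentially with f."""
    k_u = k_u0 * math.exp(f * x_u / kBT)    # X -> S, favoured by force
    k_f = k_f0 * math.exp(-f * x_f / kBT)   # S -> X, hindered by force
    return k_u, k_f


def critical_force(k_u0, k_f0, x_u, x_f, kBT=1.0):
    """Force at which k_u = k_f, i.e. the crossing point of the chevron."""
    return kBT * math.log(k_f0 / k_u0) / (x_u + x_f)


# Arbitrary illustrative parameters (hypothetical, reduced units).
f_c = critical_force(k_u0=1e-3, k_f0=1e2, x_u=2.0, x_f=3.0)
k_u, k_f = bell_rates(f_c, k_u0=1e-3, k_f0=1e2, x_u=2.0, x_f=3.0)
print("critical force:", f_c, "rates at crossing:", k_u, k_f)

# A destabilized intermediate (larger zero-force k_u0) crosses at a lower force.
f_c_spoiled = critical_force(k_u0=1e-1, k_f0=1e2, x_u=2.0, x_f=3.0)
print("critical force with a destabilized intermediate:", f_c_spoiled)
```

In this simple picture, a modification that destabilizes X raises the zero-force rate `k_u0` and therefore lowers the crossing force, which is the qualitative signature reported in the insets.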

Circular permutations. Circular permutations are
useful modifications that make it possible to investigate
the stability of native structures. In this case, new
polypeptide chains are obtained by shifting all the amino
acids in the original chain by a certain amount a. An amino
acid at position i then moves to position i + a (modulo
the number of amino acids in the protein), where a can be
positive or negative. We have measured the rates k_u and
k_f in heteropolymers obtained by circular permutations

FIG. 13: Permutation protocol. Left: Unstructured intermediate
configuration after circularly permuting S2 by a = -10.
The numbers indicate the initial position of the monomers
in S2. Right: Chevron plots for circular permutations of S2
by a. A = 1.2 k_B T*. Inset: Critical forces as a function of
a. The vertical dashed lines indicate a values where the core
dissociates.



of S2. We find, again, transitions when the circular
permutation dissociates the core. The two transitions are
found at both edges of the core (right and left, see Fig.
13) and are characterized by a sudden drop in the critical
force.
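
The index shift that defines a circular permutation can be written in a few lines. This is a minimal sketch with 0-based positions (an assumption made for convenience), and the function name `circular_permutation` and the toy sequence are introduced here only for illustration.

```python
def circular_permutation(sequence, a):
    """Move the monomer at position i to position (i + a) modulo the chain
    length. The shift a may be positive or negative (0-based positions)."""
    n = len(sequence)
    permuted = [None] * n
    for i, monomer in enumerate(sequence):
        # Python's % always returns a non-negative result for n > 0,
        # so negative shifts wrap around correctly.
        permuted[(i + a) % n] = monomer
    return permuted


# Toy five-monomer chain (hypothetical labels).
chain = list("ABCDE")
print(circular_permutation(chain, 2))
print(circular_permutation(chain, -1))
```

Keeping the original labels attached to the monomers, as in the left panel of Fig. 13, makes it easy to see which shifts a split a given stretch of the chain across the new ends.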

We have finally investigated an experimental protocol
in which we change the location of the applied force along
the chain. In this case, undesired interactions involving
the monomers that are not pulled by the force make the
analysis of the traces difficult. The traces are indeed
noisy due to the formation of new states that are very
unlikely when stretching from the very ends of the
heteropolymer. How mechanical unfolding depends on the
location of the force is an important problem that requires
a more detailed study, which we do not pursue here.



III. CONCLUSION 

In many respects designed on-lattice heteropolymers
are crude approximations to real proteins. Yet these
models again appear to share common features with the
folding of single-domain proteins. In particular, these
models seem appropriate to investigate the three-state
behavior that has been recently observed in single-protein
force experiments [11]. What is the link between the
present lattice model results and real wild-type proteins?
It is important to make clear the limitations of the current
approach. Although most of our study has been inspired
by recent results on RNase H, it remains a challenge
to establish a clear connection between the native
three-dimensional structure of a real protein and the topology
of the native structure used in heteropolymer models.
Let us stress that lattice models are phenomenological
models useful to design specific free energy landscapes
capable of reproducing different kinetic scenarios for folding
(e.g. two-state, three-state, on/off-pathway intermediate
states, correlated/uncorrelated early and intermediate
states, and so on). From this perspective we expect that
the phenomenology described here is quite general and
is probably observed in proteins other than RNase H.

Interestingly, the stabilization by mechanical force of
a unique intermediate state suggests possible ways to
infer its structure experimentally. We have found that
this state is composed of an unstructured and stretched
part of the polypeptide chain plus a rigid core that
corresponds to some part of the native state. This result
might be specific to the details of the model, yet the
competition between different types of low-entropy regions
along the polypeptide chain (a compact core versus
an extended chain) could be reasonably argued to be
the generic driving force for the formation of unstructured
extended chains. It must be emphasized, however,
that our model does not include side-chain interactions.
These are known to induce a large entropy loss upon
folding due to the excluded-volume interactions present
in the packed native state [31, 32]. Therefore we cannot
exclude a scenario where the large entropy of the side
chains induces a molten-globule-like intermediate
state under force, in which the protein keeps a single
native-like core with freely moving side chains [33]. Our
simulations also reveal (see Fig. S10) that the presence of
a rigid core is not necessarily correlated with the
hydrophobicity of the monomers in the chain. This suggests
that, although amino acid composition can facilitate the
formation of a core, an excess of hydrophobic monomers is
not a necessary requirement for its formation.

Although real proteins are too complex to be modeled
with "beads and sticks in regular lattices", these models
are useful to infer possible experimental protocols to
probe the intermediate state. The experimental protocols
we propose in this work (point mutations, cutting
the polypeptide chain and circular permutations) are well
known in protein biophysics and could be used to distinguish
between a molten globule and an unstructured extended
state. Indeed, if these modifications of the protein
lead to the same types of transitions as in Figs. 11, 12 and 13,
it is likely that the intermediate state is composed of a
core plus an unstructured extended chain. A uniform
dependence of the rates would instead suggest that
the intermediate state more closely resembles a molten-globule
structure where no rigid core is present. More generally,
these techniques could be applied to precisely determine
the location of the disordered and ordered domains in
unstructured proteins.

Finally we have shown that force measurements can 
also be used to highlight the presence of misfolded states, 
and to quantify the relative fraction of on/off-pathway 
trajectories. From this perspective force measurements 
suggest the possibility of probing the shape of the free 
energy landscape in proteins and investigating the glassy
behavior of proteins at low temperatures U, 1^, Hg] . 